PXD002801 Gygi lab mouse liver re-analysis¶

Diversity Outbred (DO) Mice¶

Quality control and sanity check notebook¶

performed by Phil Wilmarth, PSR Core OHSU¶

November 5, 2025¶


  • Overview
  • Data loading
  • Removing plex effects with IRS
  • TMM normalization and final clustering
  • CV distributions
  • Sample-to-sample scatter plots
  • Summary

Overview¶

This is a 21-plex experiment using 10-plex TMT with no reference channels per plex. Each plex is a set of biological replicates of mouse livers from the diversity outbred mice (192 mice livers with some repeated samples). Half the mice were fed a high-fat diet, and the other half were fed standard chow. There were equal numbers of male and female mice (one each per cross). Samples were randomly assigned to plexes to try and balance diet and sex. Data were downloaded from PRIDE (PXD002801) and processed with the PAW Pipeline (Wilmarth 2009) and IRS normalized using the average of each plex as a mock reference channel. This is the publication:

Chick, J.M., Munger, S.C., Simecek, P., Huttlin, E.L., Choi, K., Gatti, D.M., Raghupathy, N., Svenson, K.L., Churchill, G.A. and Gygi, S.P., 2016. Defining the consequences of genetic variation on a proteome-wide scale. Nature, 534(7608), pp.500-505.

This QC notebook will address these questions:

  • Does internal reference scaling (IRS) (Plubell 2017) remove TMT plex "batch effects"?
  • Do the samples cluster by biological group?
  • Are normalization factors small and consistent?
  • Are samples within groups similar to each other?

The sample key is a bit much for a table here. This is a somewhat balanced study design, so the average intensities for each protein in each plex might work for IRS.

References¶

Wilmarth, P.A., Riviere, M.A. and David, L.L., 2009. Techniques for accurate protein identification in shotgun proteomic studies of human, mouse, bovine, and chicken lenses. Journal of ocular biology, diseases, and informatics, 2(4), pp.223-234.

Plubell, D.L., Wilmarth, P.A., Zhao, Y., Fenton, A.M., Minnier, J., Reddy, A.P., Klimek, J., Yang, X., David, L.L. and Pamir, N., 2017. Extended multiplexing of tandem mass tags (TMT) labeling reveals age and high fat diet specific proteome changes in mouse epididymal adipose tissue. Molecular & Cellular Proteomics, 16(5), pp.873-890.


This notebook examines quality control metrics¶


Load the necessary R libraries¶

In [1]:
# library imports
library(tidyverse)
library(scales)
library(limma)
library(edgeR)
library(psych)
── Attaching core tidyverse packages ──────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: ‘scales’


The following object is masked from ‘package:purrr’:

    discard


The following object is masked from ‘package:readr’:

    col_factor



Attaching package: ‘psych’


The following objects are masked from ‘package:scales’:

    alpha, rescale


The following objects are masked from ‘package:ggplot2’:

    %+%, alpha


Define common functions for notebook use¶

In [2]:
# ================== TMM normalization from DGEList object =====================
apply_tmm_factors <- function(y, color = NULL, plot = TRUE) {
    # computes the tmm normalized data from the DGEList object
        # y - DGEList object
        # returns a dataframe with normalized intensities
    
    # compute and print "Sample loading" normalization factors
    lib_facs <- mean(y$samples$lib.size) / y$samples$lib.size
    cat("\nLibrary size factors:\n", 
        sprintf("%-5s -> %f\n", colnames(y$counts), lib_facs))
    
    # compute and print TMM normalization factors
    tmm_facs <- 1/y$samples$norm.factors
    cat("\nTrimmed mean of M-values (TMM) factors:\n", 
        sprintf("%-5s -> %f\n", colnames(y$counts), tmm_facs))
    
    # compute and print the final correction factors
    norm_facs <- lib_facs * tmm_facs
    cat("\nCombined (lib size and TMM) normalization factors:\n", 
        sprintf("%-5s -> %f\n", colnames(y$counts), norm_facs))

    # compute the normalized data as a new data frame
    tmt_tmm <- as.data.frame(sweep(y$counts, 2, norm_facs, FUN = "*"))
    colnames(tmt_tmm) <- str_c(colnames(y$counts), "_tmm")
    
    # visualize results and return data frame
    if(plot == TRUE) {
        boxplot(log10(tmt_tmm), col = color, notch = TRUE, main = "TMM Normalized data")
    }
    tmt_tmm
}

# ============== CV function ===================================================
CV <- function(df) {
    # Computes CVs of data frame rows
        # df - data frame, 
        # returns vector of CVs (%)
    ave <- rowMeans(df)    # compute averages
    sd <- apply(df, 1, sd) # compute standard deviations
    cv <- 100 * sd / ave   # compute CVs in percent (last thing gets returned)
}

# =========== Boxplot with median label ========================================
labeled_boxplot <- function(df, ylim, title) {
    # Makes a box plot with the median value labeled
        # df - data frame with data to compute CVs of
        # ylim - upper limit for y-axis
        # title - plot title
    cv = CV(df)
    boxplot(cv, ylim = c(0, ylim), notch = TRUE, main = title)
    text(x = 0.65, y = boxplot.stats(cv)$stats[3], 
         labels = round(boxplot.stats(cv)$stats[3], 1))
}

set_plot_dimensions <- function(width_choice, height_choice) {
    options(repr.plot.width=width_choice, repr.plot.height=height_choice)
}

Load the IRS-normalized TMT intensity data¶

The input data for the notebook was prepped in Excel from the IRS script output. The protein table excludes decoys, contaminants, some major blood proteins, and any proteins not seen in all 4 plexes.

In [3]:
# load the IRS-normalized data and check the table
data_irs <- read_tsv("quant_table.txt") 

# save gene names for edgeR so we can double check that results line up
accessions <- data_irs$Accession

# see how many rows of data we have
length(accessions)
Rows: 4804 Columns: 232
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr   (1): Accession
dbl (231): DO_1MH_1, DO_2MH_1, DO_3FH_1, DO_4MH_1, DO_5MS_1, DO_6FS_1, DO_7M...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
4804

Collect channels by biological conditions¶

The publication considered 4 groups by diet and sex. We will group the same way for this notebook.

In [4]:
# define the groups (after IRS)
FH_ <- select(data_irs, contains("FH_"))
cat("FH:", length(names(FH_)), "\n")
FS_ <- select(data_irs, contains("FS_"))
cat("FS:", length(names(FS_)), "\n")
MH_ <- select(data_irs, contains("MH_"))
cat("MH:", length(names(MH_)), "\n")
MS_ <- select(data_irs, contains("MS_"))
cat("MS:", length(names(MS_)), "\n")
FH: 50 
FS: 51 
MH: 58 
MS: 51 

Make the organized data frames¶

We will put the organized data into a new data frame, define some column indexes for each biological group, and set colors for plotting.

In [5]:
# put groups together into a single data frame
tmt_irs <- bind_cols(FH_, FS_, MH_, MS_)

# data.frame(col_num = 1:ncol(tmt_irs), variable = colnames(tmt_irs))
In [6]:
# define which columns go with each group
FH <- 1:50
FS <- 51:101
MH <- 102:159
MS <- 160:210

# set some colors by group
colors_group = c(rep('dark red', 50), rep('pink', 51), rep('dark blue', 58), rep('light blue', 51))

Clustering before and after IRS¶

We would expect to see clustering by plex before adjusting intensities with the IRS method. This experiment does not have as balanced samples per plex as the other smaller two experiments did. The plex batch-like effect should be removed by the IRS method and the clustering after IRS should be by biological groups. If the biological group clustering cannot be recovered, the mock reference channel approach may not be appropriate for this data.

In [7]:
# make default plot sizes a little bigger
set_plot_dimensions(9, 9)

# boxplots after IRS and before TMM
boxplot(log10(tmt_irs), col = colors_group, notch = TRUE, main = "After IRS (no TMM)")

# check clustering before IRS
plotMDS(log2(tmt_irs), main = "After IRS (no TMM)", col = colors_group)

Run TMM normalization and check final clustering¶

We will load the data into edgeR data structures and call the calcNormFactors function to perform library size and the trimmed mean of M-values (TMM) normalization to correct for any sample composition differences. We will double check if the TMM normalization changed the clustering that we had above.

Robinson, M.D. and Oshlack, A., 2010. A scaling normalization method for differential expression analysis of RNA-seq data. Genome biology, 11(3), p.R25.

We need to use the edgeR normalization factors to produce the TMM normalized data that the statistical testing will be working with. EdgeR uses the normalization factors in its statistical modeling but does not output the normalized intensities. We compute the normalized intensities with the apply_tmm_factors function.

In [8]:
# get the biological sample data into a DGEList object
group = c(rep("FH", length(FH)), rep("FS", length(FS)),
          rep("MH", length(MH)), rep("MS", length(MS)))

y <- DGEList(counts = tmt_irs, group = group, genes = accessions)

# run TMM normalization (also includes a library size factor)
y <- calcNormFactors(y)

# compute the TMM-normalized intensities
tmt_tmm <- apply_tmm_factors(y, color = colors_group)

# write the TMM-normalized intensities to file
write.table(cbind(accessions,tmt_tmm), "DO_IRS-plus-TMM.txt", sep = "\t", 
            row.names = FALSE, na =  " ")

# check the clustering
plotMDS(y, col = colors_group, main = "After IRS and TMM")
Library size factors:
 DO_3FH_1 -> 1.006862
 DO_12FH_2 -> 0.995895
 DO_14FH_2 -> 0.980331
 DO_17FH_2 -> 0.985833
 DO_26FH_3 -> 0.991650
 DO_38FH_5 -> 0.994687
 DO_40FH_5 -> 1.000440
 DO_42FH_5 -> 1.012346
 DO_47FH_6 -> 1.024000
 DO_49FH_6 -> 1.004147
 DO_55FH_6 -> 1.004504
 DO_58FH_7 -> 1.003737
 DO_60FH_7 -> 0.998770
 DO_61FH_7 -> 1.003351
 DO_68FH_8 -> 1.006505
 DO_69FH_8 -> 0.992929
 DO_75FH_9 -> 1.004702
 DO_78FH_9 -> 1.002452
 DO_83FH_10 -> 1.001996
 DO_87FH_10 -> 1.000672
 DO_88FH_10 -> 0.999882
 DO_90FH_10 -> 1.000161
 DO_94FH_11 -> 1.007224
 DO_95FH_11 -> 1.001026
 DO_96FH_11 -> 1.009083
 DO_100FH_11 -> 1.001989
 DO_104FH_12 -> 1.001875
 DO_106FH_12 -> 0.997668
 DO_108FH_12 -> 1.010022
 DO_114FH_13 -> 0.990233
 DO_120FH_14 -> 0.978154
 DO_121FH_14 -> 1.023491
 DO_133FH_15 -> 1.011545
 DO_140FH_16 -> 0.996985
 DO_141FH_16 -> 1.016052
 DO_142FH_16 -> 0.996990
 DO_143FH_16 -> 0.990118
 DO_147FH_16 -> 1.000110
 DO_149FH_16 -> 0.994751
 DO_150FH_16 -> 1.004654
 DO_154FH_17 -> 0.991000
 DO_159FH_17 -> 0.995069
 DO_170FH_18 -> 1.008897
 DO_183FH_20 -> 0.991863
 DO_184FH_20 -> 0.984431
 DO_42FH_21 -> 0.984866
 DO_145FH_21 -> 0.988509
 DO_191FH_21 -> 1.019752
 DO_12FH_21 -> 0.993941
 DO_194FH_21 -> 1.005758
 DO_6FS_1 -> 0.997116
 DO_9FS_1 -> 0.990961
 DO_10FS_1 -> 0.992491
 DO_18FS_2 -> 0.995832
 DO_19FS_2 -> 1.000139
 DO_21FS_3 -> 0.998206
 DO_22FS_3 -> 0.995127
 DO_23FS_3 -> 0.992175
 DO_32FS_4 -> 0.995651
 DO_33FS_4 -> 1.003990
 DO_34FS_4 -> 0.997206
 DO_37FS_4 -> 1.005611
 DO_41FS_5 -> 0.995850
 DO_44FS_5 -> 0.999926
 DO_45FS_5 -> 1.004522
 DO_51FS_6 -> 0.996750
 DO_53FS_6 -> 0.996751
 DO_54FS_6 -> 0.995179
 DO_59FS_7 -> 0.992742
 DO_62FS_7 -> 1.003516
 DO_64FS_7 -> 1.010183
 DO_66FS_8 -> 0.997526
 DO_72FS_8 -> 0.993821
 DO_74FS_9 -> 0.999856
 DO_77FS_9 -> 0.994943
 DO_79FS_9 -> 1.002702
 DO_85FS_10 -> 0.997301
 DO_93FS_11 -> 1.005725
 DO_99FS_11 -> 1.003018
 DO_101FS_12 -> 0.994057
 DO_105FS_12 -> 0.999136
 DO_112FS_13 -> 0.991125
 DO_116FS_13 -> 0.992739
 DO_122FS_14 -> 1.004199
 DO_128FS_14 -> 1.008352
 DO_130FS_15 -> 1.004014
 DO_131FS_15 -> 1.012238
 DO_132FS_15 -> 0.993844
 DO_146FS_16 -> 0.998074
 DO_151FS_17 -> 1.001838
 DO_152FS_17 -> 1.000045
 DO_153FS_17 -> 1.004869
 DO_161FS_18 -> 0.983602
 DO_163FS_18 -> 1.039877
 DO_167FS_18 -> 0.997462
 DO_174FS_19 -> 0.985897
 DO_179FS_19 -> 1.000652
 DO_180FS_19 -> 0.999922
 DO_188FS_20 -> 0.987578
 DO_62FS_21 -> 1.011947
 DO_193FS_21 -> 0.994444
 DO_1MH_1 -> 0.993601
 DO_2MH_1 -> 1.007922
 DO_4MH_1 -> 1.016413
 DO_7MH_1 -> 1.005489
 DO_16MH_2 -> 1.009300
 DO_4MH_2 -> 1.000929
 DO_24MH_3 -> 1.004842
 DO_25MH_3 -> 1.025246
 DO_27MH_3 -> 1.010156
 DO_4MH_3 -> 0.983766
 DO_31MH_4 -> 1.009047
 DO_35MH_4 -> 1.002238
 DO_4MH_4 -> 0.993296
 DO_39MH_5 -> 0.988816
 DO_43MH_5 -> 1.014687
 DO_4MH_5 -> 0.976842
 DO_50MH_6 -> 1.016051
 DO_4MH_6 -> 0.985780
 DO_57MH_7 -> 0.996428
 DO_63MH_7 -> 0.995516
 DO_4MH_7 -> 1.001048
 DO_67MH_8 -> 0.999535
 DO_71MH_8 -> 0.995276
 DO_4MH_8 -> 0.989023
 DO_82MH_9 -> 1.000436
 DO_4MH_9 -> 0.994999
 DO_84MH_10 -> 1.002614
 DO_86MH_10 -> 0.999922
 DO_89MH_10 -> 1.001183
 DO_4MH_10 -> 1.000843
 DO_97MH_11 -> 0.990699
 DO_4MH_11 -> 0.988826
 DO_103MH_12 -> 0.998726
 DO_107MH_12 -> 1.002153
 DO_4MH_12 -> 0.994317
 DO_110MH_13 -> 1.000987
 DO_111MH_13 -> 1.000765
 DO_117MH_13 -> 1.003283
 DO_119MH_13 -> 1.008566
 DO_125MH_14 -> 1.002970
 DO_129MH_14 -> 0.991356
 DO_134MH_15 -> 0.988269
 DO_138MH_15 -> 1.004762
 DO_148MH_16 -> 1.009571
 DO_162MH_18 -> 0.977130
 DO_165MH_18 -> 0.998380
 DO_168MH_18 -> 1.007174
 DO_169MH_18 -> 1.028101
 DO_172MH_19 -> 1.015569
 DO_175MH_19 -> 0.996073
 DO_178MH_19 -> 0.989610
 DO_181MH_20 -> 0.994278
 DO_182MH_20 -> 1.007589
 DO_185MH_20 -> 1.007334
 DO_186MH_20 -> 1.005991
 DO_189MH_20 -> 1.010115
 DO_4MH_21 -> 1.007476
 DO_192MH_21 -> 1.011334
 DO_5MS_1 -> 0.993441
 DO_8MS_1 -> 0.999286
 DO_11MS_2 -> 1.001607
 DO_13MS_2 -> 1.000930
 DO_15MS_2 -> 1.017289
 DO_20MS_3 -> 0.999724
 DO_28MS_3 -> 0.997886
 DO_29MS_4 -> 0.997317
 DO_30MS_4 -> 1.002272
 DO_36MS_4 -> 0.996318
 DO_46MS_5 -> 1.002634
 DO_48MS_6 -> 0.993973
 DO_52MS_6 -> 0.990713
 DO_56MS_7 -> 0.994381
 DO_65MS_8 -> 1.006982
 DO_70MS_8 -> 1.005026
 DO_73MS_8 -> 1.015562
 DO_76MS_9 -> 1.001115
 DO_80MS_9 -> 1.004105
 DO_81MS_9 -> 0.995988
 DO_91MS_10 -> 0.997776
 DO_92MS_11 -> 0.992305
 DO_98MS_11 -> 0.995183
 DO_102MS_12 -> 0.994520
 DO_109MS_12 -> 1.005589
 DO_113MS_13 -> 1.011214
 DO_115MS_13 -> 1.003191
 DO_118MS_13 -> 1.008615
 DO_123MS_14 -> 0.998825
 DO_124MS_14 -> 1.001904
 DO_126MS_14 -> 0.996437
 DO_127MS_14 -> 0.994663
 DO_135MS_15 -> 0.990141
 DO_136MS_15 -> 1.000797
 DO_137MS_15 -> 0.978433
 DO_139MS_15 -> 1.007136
 DO_144MS_16 -> 0.996668
 DO_155MS_17 -> 0.997987
 DO_156MS_17 -> 1.004332
 DO_157MS_17 -> 1.007964
 DO_158MS_17 -> 0.996084
 DO_160MS_17 -> 1.007634
 DO_164MS_18 -> 0.977132
 DO_166MS_18 -> 0.985102
 DO_171MS_19 -> 1.012180
 DO_173MS_19 -> 1.004120
 DO_176MS_19 -> 1.001449
 DO_177MS_19 -> 1.000497
 DO_187MS_20 -> 0.995924
 DO_190MS_20 -> 1.014378
 DO_52MS_21 -> 0.987854

Trimmed mean of M-values (TMM) factors:
 DO_3FH_1 -> 0.984637
 DO_12FH_2 -> 0.970631
 DO_14FH_2 -> 0.967783
 DO_17FH_2 -> 1.027854
 DO_26FH_3 -> 1.036544
 DO_38FH_5 -> 1.014480
 DO_40FH_5 -> 1.034588
 DO_42FH_5 -> 0.966728
 DO_47FH_6 -> 0.908890
 DO_49FH_6 -> 1.004842
 DO_55FH_6 -> 1.001417
 DO_58FH_7 -> 1.003026
 DO_60FH_7 -> 0.964322
 DO_61FH_7 -> 0.979063
 DO_68FH_8 -> 1.064563
 DO_69FH_8 -> 0.984809
 DO_75FH_9 -> 1.023070
 DO_78FH_9 -> 0.977882
 DO_83FH_10 -> 0.965629
 DO_87FH_10 -> 0.991698
 DO_88FH_10 -> 1.012273
 DO_90FH_10 -> 1.018005
 DO_94FH_11 -> 0.979385
 DO_95FH_11 -> 0.979260
 DO_96FH_11 -> 0.977134
 DO_100FH_11 -> 0.979838
 DO_104FH_12 -> 0.946883
 DO_106FH_12 -> 0.956345
 DO_108FH_12 -> 1.013455
 DO_114FH_13 -> 0.994230
 DO_120FH_14 -> 1.006590
 DO_121FH_14 -> 0.985023
 DO_133FH_15 -> 0.956319
 DO_140FH_16 -> 1.014897
 DO_141FH_16 -> 0.928495
 DO_142FH_16 -> 1.004525
 DO_143FH_16 -> 1.002125
 DO_147FH_16 -> 0.986917
 DO_149FH_16 -> 0.999223
 DO_150FH_16 -> 1.014697
 DO_154FH_17 -> 1.021475
 DO_159FH_17 -> 1.014649
 DO_170FH_18 -> 0.953876
 DO_183FH_20 -> 1.034699
 DO_184FH_20 -> 1.009735
 DO_42FH_21 -> 0.981690
 DO_145FH_21 -> 1.029946
 DO_191FH_21 -> 1.005099
 DO_12FH_21 -> 0.992890
 DO_194FH_21 -> 0.940080
 DO_6FS_1 -> 1.033892
 DO_9FS_1 -> 0.963997
 DO_10FS_1 -> 1.009481
 DO_18FS_2 -> 0.989701
 DO_19FS_2 -> 1.036582
 DO_21FS_3 -> 1.022705
 DO_22FS_3 -> 1.063323
 DO_23FS_3 -> 1.063819
 DO_32FS_4 -> 0.995780
 DO_33FS_4 -> 0.991048
 DO_34FS_4 -> 1.041301
 DO_37FS_4 -> 0.972501
 DO_41FS_5 -> 1.031427
 DO_44FS_5 -> 0.985643
 DO_45FS_5 -> 1.000516
 DO_51FS_6 -> 1.010010
 DO_53FS_6 -> 1.013910
 DO_54FS_6 -> 1.026980
 DO_59FS_7 -> 0.967873
 DO_62FS_7 -> 1.067026
 DO_64FS_7 -> 1.076214
 DO_66FS_8 -> 1.029944
 DO_72FS_8 -> 1.017481
 DO_74FS_9 -> 0.981226
 DO_77FS_9 -> 1.019004
 DO_79FS_9 -> 0.965211
 DO_85FS_10 -> 1.003304
 DO_93FS_11 -> 0.973536
 DO_99FS_11 -> 1.025248
 DO_101FS_12 -> 1.020993
 DO_105FS_12 -> 1.011515
 DO_112FS_13 -> 1.023311
 DO_116FS_13 -> 1.013065
 DO_122FS_14 -> 1.014019
 DO_128FS_14 -> 0.954047
 DO_130FS_15 -> 1.038281
 DO_131FS_15 -> 1.045136
 DO_132FS_15 -> 0.958723
 DO_146FS_16 -> 1.051391
 DO_151FS_17 -> 1.028580
 DO_152FS_17 -> 1.002740
 DO_153FS_17 -> 0.978550
 DO_161FS_18 -> 1.079662
 DO_163FS_18 -> 0.953603
 DO_167FS_18 -> 0.981736
 DO_174FS_19 -> 1.009430
 DO_179FS_19 -> 0.969062
 DO_180FS_19 -> 0.993433
 DO_188FS_20 -> 1.064875
 DO_62FS_21 -> 1.050789
 DO_193FS_21 -> 0.965976
 DO_1MH_1 -> 0.998195
 DO_2MH_1 -> 0.999468
 DO_4MH_1 -> 1.037362
 DO_7MH_1 -> 0.981022
 DO_16MH_2 -> 0.959746
 DO_4MH_2 -> 1.044157
 DO_24MH_3 -> 0.942848
 DO_25MH_3 -> 0.881993
 DO_27MH_3 -> 0.960091
 DO_4MH_3 -> 1.045901
 DO_31MH_4 -> 0.931341
 DO_35MH_4 -> 1.024665
 DO_4MH_4 -> 1.007464
 DO_39MH_5 -> 1.020780
 DO_43MH_5 -> 0.907584
 DO_4MH_5 -> 1.102616
 DO_50MH_6 -> 0.936426
 DO_4MH_6 -> 1.080914
 DO_57MH_7 -> 0.986396
 DO_63MH_7 -> 0.969432
 DO_4MH_7 -> 1.093863
 DO_67MH_8 -> 0.933502
 DO_71MH_8 -> 0.925085
 DO_4MH_8 -> 1.108611
 DO_82MH_9 -> 1.007587
 DO_4MH_9 -> 0.981486
 DO_84MH_10 -> 1.021023
 DO_86MH_10 -> 1.009195
 DO_89MH_10 -> 0.988643
 DO_4MH_10 -> 1.024197
 DO_97MH_11 -> 0.982032
 DO_4MH_11 -> 1.057607
 DO_103MH_12 -> 0.946842
 DO_107MH_12 -> 1.037748
 DO_4MH_12 -> 1.043064
 DO_110MH_13 -> 0.999426
 DO_111MH_13 -> 0.995160
 DO_117MH_13 -> 0.969681
 DO_119MH_13 -> 0.953735
 DO_125MH_14 -> 0.986235
 DO_129MH_14 -> 0.997761
 DO_134MH_15 -> 1.074754
 DO_138MH_15 -> 1.002864
 DO_148MH_16 -> 1.005994
 DO_162MH_18 -> 1.082478
 DO_165MH_18 -> 1.014983
 DO_168MH_18 -> 1.003529
 DO_169MH_18 -> 0.871072
 DO_172MH_19 -> 1.041104
 DO_175MH_19 -> 1.015518
 DO_178MH_19 -> 0.948769
 DO_181MH_20 -> 0.971105
 DO_182MH_20 -> 0.959945
 DO_185MH_20 -> 0.960447
 DO_186MH_20 -> 1.005621
 DO_189MH_20 -> 0.943917
 DO_4MH_21 -> 1.020574
 DO_192MH_21 -> 1.009096
 DO_5MS_1 -> 1.033740
 DO_8MS_1 -> 0.948307
 DO_11MS_2 -> 1.012971
 DO_13MS_2 -> 1.038838
 DO_15MS_2 -> 0.982412
 DO_20MS_3 -> 0.997720
 DO_28MS_3 -> 0.978608
 DO_29MS_4 -> 1.024003
 DO_30MS_4 -> 0.972469
 DO_36MS_4 -> 1.013167
 DO_46MS_5 -> 0.972493
 DO_48MS_6 -> 1.024515
 DO_52MS_6 -> 1.018990
 DO_56MS_7 -> 0.970391
 DO_65MS_8 -> 1.020042
 DO_70MS_8 -> 0.960172
 DO_73MS_8 -> 1.056735
 DO_76MS_9 -> 0.989547
 DO_80MS_9 -> 0.984221
 DO_81MS_9 -> 1.036821
 DO_91MS_10 -> 0.956703
 DO_92MS_11 -> 1.048526
 DO_98MS_11 -> 0.997970
 DO_102MS_12 -> 1.024650
 DO_109MS_12 -> 1.035777
 DO_113MS_13 -> 1.020010
 DO_115MS_13 -> 0.970438
 DO_118MS_13 -> 0.986983
 DO_123MS_14 -> 0.994689
 DO_124MS_14 -> 0.988876
 DO_126MS_14 -> 1.029977
 DO_127MS_14 -> 1.036642
 DO_135MS_15 -> 0.966275
 DO_136MS_15 -> 0.999988
 DO_137MS_15 -> 1.012264
 DO_139MS_15 -> 0.946462
 DO_144MS_16 -> 0.972595
 DO_155MS_17 -> 0.948379
 DO_156MS_17 -> 0.995356
 DO_157MS_17 -> 1.010819
 DO_158MS_17 -> 1.013181
 DO_160MS_17 -> 0.976844
 DO_164MS_18 -> 1.074727
 DO_166MS_18 -> 1.012637
 DO_171MS_19 -> 1.006474
 DO_173MS_19 -> 1.003007
 DO_176MS_19 -> 0.994819
 DO_177MS_19 -> 1.042915
 DO_187MS_20 -> 1.041614
 DO_190MS_20 -> 0.974922
 DO_52MS_21 -> 1.020963

Combined (lib size and TMM) normalization factors:
 DO_3FH_1 -> 0.991393
 DO_12FH_2 -> 0.966647
 DO_14FH_2 -> 0.948747
 DO_17FH_2 -> 1.013293
 DO_26FH_3 -> 1.027889
 DO_38FH_5 -> 1.009089
 DO_40FH_5 -> 1.035043
 DO_42FH_5 -> 0.978664
 DO_47FH_6 -> 0.930703
 DO_49FH_6 -> 1.009009
 DO_55FH_6 -> 1.005927
 DO_58FH_7 -> 1.006774
 DO_60FH_7 -> 0.963136
 DO_61FH_7 -> 0.982343
 DO_68FH_8 -> 1.071488
 DO_69FH_8 -> 0.977845
 DO_75FH_9 -> 1.027881
 DO_78FH_9 -> 0.980280
 DO_83FH_10 -> 0.967557
 DO_87FH_10 -> 0.992365
 DO_88FH_10 -> 1.012153
 DO_90FH_10 -> 1.018170
 DO_94FH_11 -> 0.986459
 DO_95FH_11 -> 0.980264
 DO_96FH_11 -> 0.986008
 DO_100FH_11 -> 0.981787
 DO_104FH_12 -> 0.948659
 DO_106FH_12 -> 0.954115
 DO_108FH_12 -> 1.023612
 DO_114FH_13 -> 0.984519
 DO_120FH_14 -> 0.984600
 DO_121FH_14 -> 1.008163
 DO_133FH_15 -> 0.967360
 DO_140FH_16 -> 1.011838
 DO_141FH_16 -> 0.943400
 DO_142FH_16 -> 1.001501
 DO_143FH_16 -> 0.992222
 DO_147FH_16 -> 0.987025
 DO_149FH_16 -> 0.993978
 DO_150FH_16 -> 1.019420
 DO_154FH_17 -> 1.012282
 DO_159FH_17 -> 1.009646
 DO_170FH_18 -> 0.962362
 DO_183FH_20 -> 1.026280
 DO_184FH_20 -> 0.994014
 DO_42FH_21 -> 0.966833
 DO_145FH_21 -> 1.018111
 DO_191FH_21 -> 1.024951
 DO_12FH_21 -> 0.986874
 DO_194FH_21 -> 0.945493
 DO_6FS_1 -> 1.030910
 DO_9FS_1 -> 0.955283
 DO_10FS_1 -> 1.001901
 DO_18FS_2 -> 0.985576
 DO_19FS_2 -> 1.036726
 DO_21FS_3 -> 1.020871
 DO_22FS_3 -> 1.058142
 DO_23FS_3 -> 1.055496
 DO_32FS_4 -> 0.991449
 DO_33FS_4 -> 0.995002
 DO_34FS_4 -> 1.038391
 DO_37FS_4 -> 0.977957
 DO_41FS_5 -> 1.027147
 DO_44FS_5 -> 0.985570
 DO_45FS_5 -> 1.005040
 DO_51FS_6 -> 1.006727
 DO_53FS_6 -> 1.010616
 DO_54FS_6 -> 1.022028
 DO_59FS_7 -> 0.960848
 DO_62FS_7 -> 1.070777
 DO_64FS_7 -> 1.087173
 DO_66FS_8 -> 1.027397
 DO_72FS_8 -> 1.011194
 DO_74FS_9 -> 0.981085
 DO_77FS_9 -> 1.013850
 DO_79FS_9 -> 0.967819
 DO_85FS_10 -> 1.000596
 DO_93FS_11 -> 0.979109
 DO_99FS_11 -> 1.028342
 DO_101FS_12 -> 1.014925
 DO_105FS_12 -> 1.010641
 DO_112FS_13 -> 1.014229
 DO_116FS_13 -> 1.005710
 DO_122FS_14 -> 1.018276
 DO_128FS_14 -> 0.962016
 DO_130FS_15 -> 1.042448
 DO_131FS_15 -> 1.057927
 DO_132FS_15 -> 0.952821
 DO_146FS_16 -> 1.049366
 DO_151FS_17 -> 1.030471
 DO_152FS_17 -> 1.002784
 DO_153FS_17 -> 0.983314
 DO_161FS_18 -> 1.061958
 DO_163FS_18 -> 0.991630
 DO_167FS_18 -> 0.979244
 DO_174FS_19 -> 0.995194
 DO_179FS_19 -> 0.969693
 DO_180FS_19 -> 0.993356
 DO_188FS_20 -> 1.051647
 DO_62FS_21 -> 1.063342
 DO_193FS_21 -> 0.960608
 DO_1MH_1 -> 0.991808
 DO_2MH_1 -> 1.007386
 DO_4MH_1 -> 1.054389
 DO_7MH_1 -> 0.986407
 DO_16MH_2 -> 0.968672
 DO_4MH_2 -> 1.045127
 DO_24MH_3 -> 0.947413
 DO_25MH_3 -> 0.904260
 DO_27MH_3 -> 0.969842
 DO_4MH_3 -> 1.028921
 DO_31MH_4 -> 0.939767
 DO_35MH_4 -> 1.026958
 DO_4MH_4 -> 1.000710
 DO_39MH_5 -> 1.009363
 DO_43MH_5 -> 0.920914
 DO_4MH_5 -> 1.077082
 DO_50MH_6 -> 0.951457
 DO_4MH_6 -> 1.065543
 DO_57MH_7 -> 0.982873
 DO_63MH_7 -> 0.965085
 DO_4MH_7 -> 1.095009
 DO_67MH_8 -> 0.933068
 DO_71MH_8 -> 0.920715
 DO_4MH_8 -> 1.096442
 DO_82MH_9 -> 1.008026
 DO_4MH_9 -> 0.976577
 DO_84MH_10 -> 1.023692
 DO_86MH_10 -> 1.009116
 DO_89MH_10 -> 0.989812
 DO_4MH_10 -> 1.025060
 DO_97MH_11 -> 0.972899
 DO_4MH_11 -> 1.045789
 DO_103MH_12 -> 0.945636
 DO_107MH_12 -> 1.039983
 DO_4MH_12 -> 1.037135
 DO_110MH_13 -> 1.000413
 DO_111MH_13 -> 0.995922
 DO_117MH_13 -> 0.972864
 DO_119MH_13 -> 0.961905
 DO_125MH_14 -> 0.989164
 DO_129MH_14 -> 0.989137
 DO_134MH_15 -> 1.062146
 DO_138MH_15 -> 1.007639
 DO_148MH_16 -> 1.015622
 DO_162MH_18 -> 1.057722
 DO_165MH_18 -> 1.013339
 DO_168MH_18 -> 1.010728
 DO_169MH_18 -> 0.895549
 DO_172MH_19 -> 1.057313
 DO_175MH_19 -> 1.011529
 DO_178MH_19 -> 0.938912
 DO_181MH_20 -> 0.965548
 DO_182MH_20 -> 0.967230
 DO_185MH_20 -> 0.967491
 DO_186MH_20 -> 1.011646
 DO_189MH_20 -> 0.953464
 DO_4MH_21 -> 1.028203
 DO_192MH_21 -> 1.020534
 DO_5MS_1 -> 1.026961
 DO_8MS_1 -> 0.947631
 DO_11MS_2 -> 1.014599
 DO_13MS_2 -> 1.039804
 DO_15MS_2 -> 0.999396
 DO_20MS_3 -> 0.997445
 DO_28MS_3 -> 0.976540
 DO_29MS_4 -> 1.021256
 DO_30MS_4 -> 0.974678
 DO_36MS_4 -> 1.009436
 DO_46MS_5 -> 0.975054
 DO_48MS_6 -> 1.018340
 DO_52MS_6 -> 1.009527
 DO_56MS_7 -> 0.964938
 DO_65MS_8 -> 1.027164
 DO_70MS_8 -> 0.964997
 DO_73MS_8 -> 1.073179
 DO_76MS_9 -> 0.990650
 DO_80MS_9 -> 0.988261
 DO_81MS_9 -> 1.032662
 DO_91MS_10 -> 0.954575
 DO_92MS_11 -> 1.040457
 DO_98MS_11 -> 0.993163
 DO_102MS_12 -> 1.019035
 DO_109MS_12 -> 1.041566
 DO_113MS_13 -> 1.031448
 DO_115MS_13 -> 0.973535
 DO_118MS_13 -> 0.995486
 DO_123MS_14 -> 0.993520
 DO_124MS_14 -> 0.990760
 DO_126MS_14 -> 1.026307
 DO_127MS_14 -> 1.031109
 DO_135MS_15 -> 0.956749
 DO_136MS_15 -> 1.000785
 DO_137MS_15 -> 0.990433
 DO_139MS_15 -> 0.953216
 DO_144MS_16 -> 0.969354
 DO_155MS_17 -> 0.946469
 DO_156MS_17 -> 0.999667
 DO_157MS_17 -> 1.018869
 DO_158MS_17 -> 1.009214
 DO_160MS_17 -> 0.984301
 DO_164MS_18 -> 1.050150
 DO_166MS_18 -> 0.997550
 DO_171MS_19 -> 1.018733
 DO_173MS_19 -> 1.007139
 DO_176MS_19 -> 0.996260
 DO_177MS_19 -> 1.043433
 DO_187MS_20 -> 1.037368
 DO_190MS_20 -> 0.988939
 DO_52MS_21 -> 1.008563

IRS and TMM seem to eliminate the plex effect¶

Boxplots and normalization factors suggest okay sample-to-sample similarity after IRS and TMM adjustments. The clustering looks biological (not by plex). We saw strong sex effects in the CC experiment, and we see that here with male samples on the right and female samples on the left. The diet choice shows top-to-bottom separation in both sexes, although the relative degree of variation is a lot larger for the x-axis than for the y-axis.

Check CV distributions¶

The distributions of Coefficients of Variation (CVs) are another way to get an idea of how individual proteins are behaving. This is an effective way to assess proper normalization in these experiments. We will compute CV distributions for each of the biological conditions.

In [9]:
# put CVs in data frames to simplify plots and summaries
# we will skip the n=1 group (TOFPS)
cv_tmm <- data.frame(FH = CV(tmt_tmm[FH]), FS = CV(tmt_tmm[FS]),
                     MH = CV(tmt_tmm[MH]), MS = CV(tmt_tmm[MS]))

medians <- apply(cv_tmm, 2, FUN = median)
print("Final median CVs by condition (%)")
round(medians, 2)
[1] "Final median CVs by condition (%)"
FH
15.09
FS
12.75
MH
16.32
MS
13.58

Average median CV is 14.4%¶

The average median CV in this experiment is higher than we had for the CC experiment or the founder experiment. This probably reflects that the plexes are not as well balanced as in the two smaller experiments. It might also reflect more biological variability in the larger cohort of crosses.

Use ggplot to visualize the CV distributions¶

We will look at the final IRS plus TMM normalized data here.

In [10]:
# see what the CV distibutions look like
# need long form for ggplot
long_cv_tmm <- gather(cv_tmm, key = "group", value = "cv") 

# traditional boxplots
ggplot(long_cv_tmm, aes(x = group, y = cv, fill = group)) +
  geom_boxplot(notch = TRUE) +
  ggtitle("Final CV distributions")

# density plots
ggplot(long_cv_tmm, aes(x = cv, color = group)) +
  geom_density() +
  coord_cartesian(xlim = c(0, 100)) +
  ggtitle("Final CV distributions")

The CV distributions are still narrow and similar¶

Even though the median CVs are a little higher, the shapes of the four CV distributions are very similar, and most CVs are relatively small (under 30%).

Compare samples within each biological condition¶

We will need to draw some random subsets of samples in each group to have manageable scatter plot grids (5 or 6 sample is about as many as we can do).

In [11]:
# multi-panel scatter plot grids, final data (random draws of 5 samples per group)
pairs.panels(log10(tmt_tmm[sample(FH, 5)]), lm = TRUE, main = "Female, high-fat diet - final")

pairs.panels(log10(tmt_tmm[sample(FS, 5)]), lm = TRUE, main = "Female, standard diet - final")

pairs.panels(log10(tmt_tmm[sample(MH, 5)]), lm = TRUE, main = "Male, high-fat diet - final")

pairs.panels(log10(tmt_tmm[sample(MS, 5)]), lm = TRUE, main = "Male, standard diet - final")

Sample-to-sample correlations are okay¶

The scatter plots in each group show a little more scatter than we saw in the two smaller experiments. This might reflect the increased biological diversity in this experiment.

Compare averages of each group to each other¶

We can get a little heads up on the statistical testing by comparing group averages to each other.

NOTE: Correlation coefficients are not very informative in the scatter plot grids. The liver samples from the mice co-vary with protein relative abundance, so the coefficients are always close to 1.0. The (subjective) visual impression of degree of scatter is a more useful metric.

In [12]:
# save the dataframe with the averages by developmental age
ave_tmm <- cbind(FH_ave = rowMeans(tmt_tmm[FH]),
                 FS_ave = rowMeans(tmt_tmm[FS]),
                 MH_ave = rowMeans(tmt_tmm[MH]),
                 MS_ave = rowMeans(tmt_tmm[MS]))

# make a scatter plot grid for the group averages
pairs.panels(log10(ave_tmm), lm = TRUE, main = "Group Averages - final")

Summary¶

The scatter plots of group averages indicate that the sex differences are quite a bit larger than the diet effect. The upper left and lower right scatter plots are tight to the diagonal trend line indicating a relatively small diet effect. Scatter plots between sexes, independent of diet, show many more proteins having larger abundance changes. Quality control checks seem okay. We can move on to any statistical testing.

Log the session information¶

In [13]:
sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-apple-darwin20
Running under: macOS 15.7.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] psych_2.4.3     edgeR_4.2.0     limma_3.60.2    scales_1.3.0   
 [5] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4    
 [9] purrr_1.0.2     readr_2.1.5     tidyr_1.3.1     tibble_3.2.1   
[13] ggplot2_3.5.1   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] utf8_1.2.4        generics_0.1.3    stringi_1.8.4     lattice_0.22-6   
 [5] hms_1.1.3         digest_0.6.35     magrittr_2.0.3    evaluate_0.23    
 [9] grid_4.4.0        timechange_0.3.0  pbdZMQ_0.3-11     fastmap_1.2.0    
[13] jsonlite_1.8.8    fansi_1.0.6       mnormt_2.1.1      cli_3.6.2        
[17] rlang_1.1.3       crayon_1.5.2      bit64_4.0.5       munsell_0.5.1    
[21] base64enc_0.1-3   withr_3.0.0       repr_1.1.7        tools_4.4.0      
[25] parallel_4.4.0    tzdb_0.4.0        uuid_1.2-0        colorspace_2.1-0 
[29] locfit_1.5-9.9    IRdisplay_1.1     vctrs_0.6.5       R6_2.5.1         
[33] lifecycle_1.0.4   bit_4.0.5         vroom_1.6.5       pkgconfig_2.0.3  
[37] pillar_1.9.0      gtable_0.3.5      glue_1.7.0        Rcpp_1.0.12      
[41] statmod_1.5.0     tidyselect_1.2.1  IRkernel_1.3.2    farver_2.1.2     
[45] htmltools_0.5.8.1 nlme_3.1-164      labeling_0.4.3    compiler_4.4.0   
In [ ]: